pacman::p_load(sf, sfdep, tmap,
tidyverse, knitr, patchwork,
leaflet, scales)
# - Creates a package list containing the necessary R packages
# - Checks if the R packages in the package list have been installed
# - If not installed, will install the missing packages & launch into R environment.Take-home Exercise 1: Geospatial Analytics for Public Good
1 Overview
1.1 Background
As urban infrastructures, including public transportation systems like buses, taxis, mass rapid transit, public utilities, and roads, become increasingly digitised, the data generated becomes a valuable resource for tracking the movements of people and vehicles over space and time. This transformation has been facilitated by pervasive computing technologies such as Global Positioning System (GPS) and Radio Frequency Identification (RFID) tags on vehicles. An example of this is the collection of data on bus routes and ridership, amassed from the use of smart cards and GPS devices available on public buses.
The data collected from these sources is likely to contain valuable patterns that offer insights into various aspects of human movement and behavior within a city. Analyzing and comparing these patterns can provide a deeper understanding of urban mobility. Such insights can be instrumental in improving urban management and can also serve as valuable information for both public and private sector stakeholders involved in urban transportation services. This information can aid them in making informed decisions and gaining a competitive edge in their respective domains.
1.2 Objectives
The objective of this study is to utilize appropriate Local Indicators of Spatial Association (LISA) techniques to reveal spatial mobility patterns among public bus passengers in Singapore.
This will include the following tasks:
- Geovisualisation and Analysis:
- Compute the passenger trips generated by origin at the hexagon level
- Display the geographical distribution of the passenger trips
- Explore spatial patterns revealed by the geovisualisation
| Peak hour period | Bus tap on time |
|---|---|
| Weekday morning peak | 6am to 9am |
| Weekday afternoon peak | 5pm to 8pm |
| Weekend/holiday morning peak | 11am to 2pm |
| Weekend/holiday evening peak | 4pm to 7pm |
- Local Indicators of Spatial Association (LISA) Analysis
- Compute LISA of the passenger trips generate by origin
- Display and draw statistical conclusions of LISA maps
2 Loading Packages
The following packages will be used for this exercise:
| Package | Description |
|---|---|
| sf | For importing, managing, and handling geospatial data |
| sfdep | Used to compute spatial weights, global and local spatial autocorrelation statistics |
| tmap | For thematic mapping |
| tidyverse | For non-spatial data wrangling |
| knitr | For dynamic report generation |
| patchwork | For plot manipulation |
| leaflet | For interactive maps |
| scales | For scaling graphs |
The code chunk below, using p_load function of the pacman package, ensures that packages required are installed and loaded in R.
3 Data Preparation
3.1 The Data
The following data are used for this study:
- Aspatial:
- Passenger Volume by Origin Destination Bus Stops for August 2023, downloaded from LTA DataMall using API.
- Geospatial
- Bus Stop Location from LTA DataMall. It provides information about all the bus stops currently being serviced by buses, including the bus stop code (identifier) and location coordinates.
- hexagon, a hexagon layer of 250m to replace the relative coarse and irregular Master Plan 2019 Planning Sub-zone GIS data set of URA.
3.2 Import & Preparation
3.2.1 Aspatial
We will be importing the Passenger Volume by Origin Destination Bus Stops dataset for August 2023, downloaded from LTA DataMall by using read_csv() or readr package.
odbus08 <- read_csv("data/aspatial/origin_destination_bus_202308.csv")(a) Attributes
glimpse() of the dplyr package allows us to see all columns and their data type in the data frame.
glimpse(odbus08)Rows: 5,709,512
Columns: 7
$ YEAR_MONTH <chr> "2023-08", "2023-08", "2023-08", "2023-08", "2023-…
$ DAY_TYPE <chr> "WEEKDAY", "WEEKENDS/HOLIDAY", "WEEKENDS/HOLIDAY",…
$ TIME_PER_HOUR <dbl> 16, 16, 14, 14, 17, 17, 17, 17, 7, 17, 14, 10, 10,…
$ PT_TYPE <chr> "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "…
$ ORIGIN_PT_CODE <chr> "04168", "04168", "80119", "80119", "44069", "4406…
$ DESTINATION_PT_CODE <chr> "10051", "10051", "90079", "90079", "17229", "1722…
$ TOTAL_TRIPS <dbl> 7, 2, 3, 10, 5, 4, 3, 22, 3, 3, 7, 1, 3, 1, 3, 1, …
Insights:
- There are 7 variables in the odbus08 tibble data, they are:
- YEAR_MONTH: Month in which data is collected
- DAY_TYPE: Weekdays or weekends/holidays
- TIME_PER_HOUR: Hour which the passenger trip is based on, in intervals from 0 to 23 hours
- PT_TYPE: Type of public transport, i.e. bus
- ORIGIN_PT_CODE: Origin bus stop ID
- DESTINATION_PT_CODE: Destination bus stop ID
- TOTAL_TRIPS: Number of trips
- We also note that values in ORIGIN_PT_CODE and DESTINATON_PT_CODE are in numeric data type. These should be in factor data type for further processing and georeferencing.
(b) Unique Bus Stops
n_distinct() of the dplyr package allows us to count the unique originating bus stops in the data set.
n_distinct(odbus08$ORIGIN_PT_CODE)[1] 5067
The results reveal that there are 5067 distinct origin bus stops.
(a) Convert Data Type
as.factor() can be used to convert the variables ORIGIN_PT_CODE and DESTINATON_PT_CODE from numeric to categorical data type. We use glimpse() again to check the results.
odbus08$ORIGIN_PT_CODE <- as.factor(odbus08$ORIGIN_PT_CODE)
odbus08$DESTINATION_PT_CODE <- as.factor(odbus08$DESTINATION_PT_CODE)
glimpse(odbus08)Rows: 5,709,512
Columns: 7
$ YEAR_MONTH <chr> "2023-08", "2023-08", "2023-08", "2023-08", "2023-…
$ DAY_TYPE <chr> "WEEKDAY", "WEEKENDS/HOLIDAY", "WEEKENDS/HOLIDAY",…
$ TIME_PER_HOUR <dbl> 16, 16, 14, 14, 17, 17, 17, 17, 7, 17, 14, 10, 10,…
$ PT_TYPE <chr> "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "…
$ ORIGIN_PT_CODE <fct> 04168, 04168, 80119, 80119, 44069, 44069, 20281, 2…
$ DESTINATION_PT_CODE <fct> 10051, 10051, 90079, 90079, 17229, 17229, 20141, 2…
$ TOTAL_TRIPS <dbl> 7, 2, 3, 10, 5, 4, 3, 22, 3, 3, 7, 1, 3, 1, 3, 1, …
Note that both of them are in factor data type now.
(b) Duplicates Check
Before moving on to the next step, it is a good practice for us to check for duplicated records to prevent double counting of passenger trips.
duplicate <- odbus08 %>%
group_by_all() %>%
filter(n()>1) %>%
ungroup()
duplicate# A tibble: 0 × 7
# ℹ 7 variables: YEAR_MONTH <chr>, DAY_TYPE <chr>, TIME_PER_HOUR <dbl>,
# PT_TYPE <chr>, ORIGIN_PT_CODE <fct>, DESTINATION_PT_CODE <fct>,
# TOTAL_TRIPS <dbl>
Results confirm that there are no duplicated records found.
(c) Extracting the Study Data
In our study, we would like to know patterns for 4 peak hour periods. Therefore, we can create a new variable period using the ifelse() that states whether an observation occurred during peak period using the code chunk below.
peak <- odbus08 %>%
# Weekday morning peak
mutate(period= ifelse(DAY_TYPE=="WEEKDAY" & (TIME_PER_HOUR >= 6 & TIME_PER_HOUR <= 9), "WDM",
# Weekday afternoon peak
ifelse(DAY_TYPE=="WEEKDAY" & (TIME_PER_HOUR >= 17 & TIME_PER_HOUR <= 20), "WDA",
# Weekend/holiday morning peak
ifelse(DAY_TYPE=="WEEKENDS/HOLIDAY" & (TIME_PER_HOUR >= 11 & TIME_PER_HOUR <= 14), "WEM",
# Weekend/holiday evening peak
ifelse(DAY_TYPE=="WEEKENDS/HOLIDAY" & (TIME_PER_HOUR >= 16 & TIME_PER_HOUR <= 19), "WEE",
# Return off-peak if neither of the peak hour periods
"Off-peak")))))We can then filter for peak-period data using the newly created period column and aggregate the total trips for each origin bus stop during peak period.
peakperiods <- peak %>%
# filter helps to keep records that occurred during period periods
filter(period !="Off-peak") %>%
# aggregate the total passenger trips for each origin bus stop
group_by(period, ORIGIN_PT_CODE) %>%
summarise(TRIPS=sum(TOTAL_TRIPS))Let’s visualise the proportions of passenger volumes for each peak period.
Show the code
freq<- ggplot(data=peakperiods,
aes(x=period,y=TRIPS))+
geom_bar(stat="identity") +
theme(legend.position="none")+
labs(title = "Frequency of Trip for each Peak Period",
x = "Peak Period",
y = "Frequency")
freq + scale_y_continuous(labels=label_comma()) + scale_x_discrete(labels=c("WDA"="Weekday Afternoon", "WDM"= "Weekday Morning", "WEE"= "Weekend Evening", "WEM"= "Weekend Morning"))
We can see that passenger volume on weekdays are much higher than over the weekends/holidays.
Transpose each peak period period as a columns using pivot_wider() of tidyr package will allow us to create further variables at a bus stop level. We replace NA values with 0 to reflect when there are no traffic for certain periods.
peakperiods_wide <- pivot_wider(peakperiods,
names_from = "period",
values_from = "TRIPS")
peakperiods_wide["WDA"][is.na(peakperiods_wide["WDA"])] <- 0
peakperiods_wide["WDM"][is.na(peakperiods_wide["WDM"])] <- 0
peakperiods_wide["WEE"][is.na(peakperiods_wide["WEE"])] <- 0
peakperiods_wide["WEM"][is.na(peakperiods_wide["WEM"])] <- 0Each peak period has its own column, as seen using the glimpse() function of dplyr.
glimpse(peakperiods_wide)Rows: 5,067
Columns: 5
$ ORIGIN_PT_CODE <fct> 01012, 01013, 01019, 01029, 01039, 01059, 01109, 01112,…
$ WDA <dbl> 8448, 7328, 3608, 9317, 12937, 2133, 322, 45010, 27233,…
$ WDM <dbl> 1973, 952, 1789, 2561, 2938, 1651, 161, 8492, 9015, 424…
$ WEE <dbl> 3208, 2796, 1623, 4244, 7403, 1190, 88, 21706, 11927, 6…
$ WEM <dbl> 2273, 1697, 1511, 3272, 5424, 1062, 89, 14964, 8278, 61…
3.2.2 Geospatial
(a) Bus Stop Shapefile
In this section, we import BusStop shapefile into RStudio using st_read() function of sf package. This data provides the locations of all bus stops as at Q2 of 2023. crs = 3414 ensures coordinate reference system (CRS) is 3414, which is the EPSG code for the SVY21 projection used in Singapore.
busstop <- st_read(dsn = "data/geospatial",
layer = "BusStop") %>%
st_transform(crs = 3414)Reading layer `BusStop' from data source
`C:\kytjy\ISSS624\Take-Home_Ex\Take-Home_Ex1\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 5161 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21
The imported shape file is simple features object of sf. From the output, we can see that there are 5161 points with 3 fields, and confirm that the datum SVY21 is correct.
Recall that there are 5067 origin bus stops from the peakperiods_wide table, compared to the 5161 bus stops from LTA’s BusStop shape file. This could be due to timing difference – LTA’s BusStop shapefile is as of July 2023, while peakperiod is based on Aug 2023.
qtm() allows us to quickly map the points to visualise the results.
qtm(busstop)
Note that there are 5 bus stops located outside Singapore, they are bus stops 46239, 46609, 47701, 46211, and 46219.
(b) Hexagon Layer
A hexagonal grid is used to replace the relative coarse and irregular Master Plan 2019 Planning Sub-zone GIS data set of URA. Hexagons have a number of advantages over these other shapes:
- The distance between the centroid of a hexagon to all neighboring centroids is the same in all directions.
- The lack of acute angles in a regular hexagon means that no areas of the shape are outliers in any direction.
- All neighboring hexagons have the same spatial relationship with the central hexagon, making spatial querying and joining a more straightforward process.
- Unlike square-based grids, the geometry of hexagons are well-structured to represent curves of geographic features which are rarely perpendicular in shape, such as rivers and roads.
- The “softer” shape of a hexagon compared to a square means it performs better at representing gradual spatial changes.
We first create a hexagonal grid layer of 250m (refers to the perpendicular distance between the centre of the hexagon and its edges) with st_make_grid, and st_sf to convert the grid into an sf object with the codes below.
st_make_grid function is used to create a grid over a spatial object. It takes 4 arguments, they are:
x: sf object; the input spatial data
cellsize: for hexagonal cells the distance between opposite edges in the unit of the crs the spatial data is using. In this case, we take cellsize to be 250m * 2 = 500m
- what: character; one of:
"polygons","corners", or"centers" - square: indicates whether you are a square grid (TRUE) or hexagon grid (FALSE)
area_hexagon_grid = st_make_grid(busstop, 500, what = "polygons", square = FALSE)Next, st_sf converts the grid created to sf object while lengths() of Base R is used to calculate the number of grids created.
# Converts grid to sf
hexagon_grid_sf = st_sf(area_hexagon_grid) %>%
# Assign unique ID to each grid
mutate(grid_id = 1:length(lengths(area_hexagon_grid)))We count the number of bus stops in each grid and keep grids with bus stops using the code chunks below.
# Create a column containing the count of bus stops in each grid
hexagon_grid_sf$busstops = lengths(st_intersects(hexagon_grid_sf, busstop))
# Remove if no bus stop in side that grid, ie only keep hexagons with bus stops
hexagon_w_busstops = filter(hexagon_grid_sf, busstops > 0)Let’s confirm that all bus stops have been accounted for in our hexagon layer.
sum(hexagon_w_busstops$busstops)[1] 5161
This is in line with the 5161 points of the busstop shapefile.
Lastly, using tm_shape of tmap, we can quickly visualise the results of the hexagon grids we have created.
Show the code
tmap_mode ("plot")
tm_shape(hexagon_w_busstops)+
tm_fill(
col = "busstops",
palette = "Blues",
style = "cont",
title = "Number of Bus Stops",
id = "grid_id",
showNA = FALSE,
alpha = 0.6,
popup.format = list(
grid_id = list(format = "f", digits = 0)
)
)+
tm_borders(col = "grey40", lwd = 0.7)
(a) Combining Busstop and hexagon layer
Code chunk below populates the grid ID (i.e. grid_id) of hexagon_w_busstops sf data frame into busstop sf data frame.
bs_wgrids <- st_intersection(busstop, hexagon_w_busstops) %>%
select(BUS_STOP_N,BUS_ROOF_N,LOC_DESC, grid_id) %>%
st_drop_geometryst_intersection()is used to perform point and polygon overly and the output will be in point sf object.select()of dplyr package is then use to retain preferred variables from the data frames.st_stop_geometry()removes the geometry data to manipulate it like a regular dataframe usingtidyranddplyrfunctions
Before we proceed, let’s perform a duplicates check on bs_wgrids.
duplicate2 <- bs_wgrids %>%
group_by_all() %>%
filter(n()>1) %>%
ungroup()
duplicate2# A tibble: 8 × 4
BUS_STOP_N BUS_ROOF_N LOC_DESC grid_id
<chr> <chr> <chr> <int>
1 43709 B06 BLK 644 1904
2 43709 B06 BLK 644 1904
3 58031 UNK OPP CANBERRA DR 2939
4 58031 UNK OPP CANBERRA DR 2939
5 51071 B21 MACRITCHIE RESERVOIR 3081
6 51071 B21 MACRITCHIE RESERVOIR 3081
7 97079 B14 OPP ST. JOHN'S CRES 5037
8 97079 B14 OPP ST. JOHN'S CRES 5037
Results displayed 4 seemingly genuine duplicated records, with same bus stop number, roof, and location description. We remove these to prevent double-counting.
The code chunk below helps retain unique records.
bs_wgrids <- unique(bs_wgrids)(b) Populate PeakPeriods with Grid Details
We can now append the grid ID from bs_wgrids data frame onto peakperiods_wide data frame using left_join. Recall we previously identified 5 bus stops outside Singapore, filter() allows us to exclude the 5 outside Singapore. LOC_DESC = first(LOC_DESC) geta the first location description for each grid_id group, this allows us to have a rough reference of the name of the bus stop when visualising later.
origin_grid <- left_join(peakperiods_wide, bs_wgrids,
by = c("ORIGIN_PT_CODE" = "BUS_STOP_N")) %>%
rename(ORIGIN_BS = ORIGIN_PT_CODE) %>%
filter(!ORIGIN_BS %in% c(46239, 46609, 47701, 46211, 46219)) %>% # retains SG bus stops
group_by(grid_id) %>%
summarise(
LOC_DESC = first(LOC_DESC),
WDA = sum(WDA),
WDM = sum(WDM),
WEM = sum(WEM),
WEE = sum(WEE)
) %>%
filter(!is.na(grid_id))
glimpse(origin_grid)Rows: 1,503
Columns: 6
$ grid_id <int> 34, 65, 99, 127, 129, 130, 131, 159, 160, 161, 163, 190, 191,…
$ LOC_DESC <chr> "AFT TUAS STH BLVD", "BEF TUAS STH AVE 14", "YONG NAM", "OPP …
$ WDA <dbl> 417, 110, 249, 1810, 2985, 220, 325, 318, 196, 2633, 173, 555…
$ WDM <dbl> 62, 50, 44, 155, 1201, 73, 82, 59, 123, 773, 407, 183, 168, 2…
$ WEM <dbl> 5, 24, 27, 148, 512, 75, 43, 23, 39, 556, 47, 76, 58, 1377, 3…
$ WEE <dbl> 65, 26, 54, 303, 637, 35, 49, 43, 141, 981, 162, 124, 79, 138…
Show the code
# Weekday Morning
# Extract column
distWDM <- origin_grid$WDM
# Calculate mean
distWDM_mean <- mean(distWDM)
plot_distWDM <- ggplot(
data = data.frame(distWDM),
aes(x = distWDM)
) +
geom_histogram(
bins = 20,
color = "#FFFCF9",
fill = "black",
alpha = .3
) +
# Add line for mean
geom_vline(
xintercept = distWDM_mean,
color = "#595DE5",
linetype = "dashed",
linewidth = 1
) +
# Add line annotations
annotate(
"text",
x = 55000,
y = 1200,
label = paste("Mean =", round(distWDM_mean, 3)),
color = "#595DE5",
size = 3
) +
labs(
title = "Weekday Morning Peak",
x = "Bus Trips",
y = "Frequency"
) +
theme(
axis.text.x = element_text(angle = 45, hjust=1)
) +
scale_x_continuous(labels = label_number(), n.breaks=8)
# Weekday Afternoon
# Extract column
distWDA <- origin_grid$WDA
# Calculate mean
distWDA_mean <- mean(distWDA)
plot_distWDA <- ggplot(
data = data.frame(distWDA),
aes(x = distWDA)
) +
geom_histogram(
bins = 20,
color = "#FFFCF9",
fill = "black",
alpha = .3
) +
# Add line for mean
geom_vline(
xintercept = distWDA_mean,
color = "#595DE5",
linetype = "dashed",
linewidth = 1
) +
# Add line annotations
annotate(
"text",
x = 140000,
y = 1200,
label = paste("Mean =", round(distWDA_mean, 3)),
color = "#595DE5",
size = 3
) +
labs(
title = "Weekday Afternoon Peak",
x = "Bus Trips",
y = "Frequency"
) +
theme(
axis.text.x = element_text(angle = 45, hjust=1)
) +
scale_x_continuous(labels = label_number(), n.breaks=8)
# Weekend Morning
# Extract column
distWEM <- origin_grid$WEM
# Calculate mean
distWEM_mean <- mean(distWEM)
plot_distWEM <- ggplot(
data = data.frame(distWEM),
aes(x = distWEM)
) +
geom_histogram(
bins = 20,
color = "#FFFCF9",
fill = "black",
alpha = .3
) +
# Add line for mean
geom_vline(
xintercept = distWEM_mean,
color = "#595DE5",
linetype = "dashed",
linewidth = 1
) +
# Add line annotations
annotate(
"text",
x = 33000,
y = 1200,
label = paste("Mean =", round(distWEM_mean, 3)),
color = "#595DE5",
size = 3
) +
labs(
title = "Weekdend/Holiday Morning Peak",
x = "Bus Trips",
y = "Frequency"
) +
theme(
axis.text.x = element_text(angle = 45, hjust=1)
) +
scale_x_continuous(labels = label_number(), n.breaks=8)
# Weekend Evening
# Extract column
distWEE <- origin_grid$WEE
# Calculate mean
distWEE_mean <- mean(distWEE)
plot_distWEE <- ggplot(
data = data.frame(distWEE),
aes(x = distWEE)
) +
geom_histogram(
bins = 20,
color = "#FFFCF9",
fill = "black",
alpha = .3
) +
# Add line for mean
geom_vline(
xintercept = distWEE_mean,
color = "#595DE5",
linetype = "dashed",
linewidth = 1
) +
# Add line annotations
annotate(
"text",
x = 39000,
y = 1200,
label = paste("Mean =", round(distWEE_mean, 3)),
color = "#595DE5",
size = 3
) +
labs(
title = "Weekend/Holiday Evening Peak",
x = "Bus Trips",
y = "Frequency"
) +
theme(
axis.text.x = element_text(angle = 45, hjust=1)
) +
scale_x_continuous(labels = label_number(), n.breaks=8)
(plot_distWDM | plot_distWDA)/
(plot_distWEM | plot_distWEE)
The distribution of passenger trips for the 4 peak periods at a hexagon level appear to be highly skewed to the right. Rescaling our data using log transformation can greatly reduce the skewness.
origin_grid_wider <- origin_grid %>%
mutate(logWDM = ifelse(WDM == 0, 0, log(WDM)),
logWDA = ifelse(WDA == 0, 0, log(WDA)),
logWEM = ifelse(WEM == 0, 0, log(WEM)),
logWEE = ifelse(WEE == 0, 0, log(WEE)))Let’s visualise the distribution of the 4 peak periods again.
Show the code
# Weekday Morning
# Extract column
distlogWDM <- origin_grid_wider$logWDM
# Calculate mean
distlogWDM_mean <- mean(distlogWDM)
plot_distlogWDM <- ggplot(
data = data.frame(distlogWDM),
aes(x = distlogWDM)
) +
geom_histogram(
bins = 20,
color = "#FFFCF9",
fill = "#34414E",
alpha = .3
) +
# Add line for mean
geom_vline(
xintercept = distlogWDM_mean,
color = "#595DE5",
linetype = "dashed",
linewidth = 1
) +
# Add line annotations
annotate(
"text",
x = 12,
y = 300,
label = paste("Mean =", round(distlogWDM_mean, 3)),
color = "#595DE5",
size = 3
) +
labs(
title = "Weekday Morning Peak",
x = "Bus Trips",
y = "Frequency"
) +
theme(
axis.text.x = element_text(angle = 45, hjust=1)
)
# Weekday Afternoon
# Extract column
distlogWDA <- origin_grid_wider$logWDA
# Calculate mean
distlogWDA_mean <- mean(distlogWDA)
plot_distlogWDA <- ggplot(
data = data.frame(distlogWDA),
aes(x = distlogWDA)
) +
geom_histogram(
bins = 20,
color = "#FFFCF9",
fill = "#34414E",
alpha = .3
) +
# Add line for mean
geom_vline(
xintercept = distlogWDA_mean,
color = "#595DE5",
linetype = "dashed",
linewidth = 1
) +
# Add line annotations
annotate(
"text",
x = 12,
y = 300,
label = paste("Mean =", round(distlogWDA_mean, 3)),
color = "#595DE5",
size = 3
) +
labs(
title = "Weekday Afternoon Peak",
x = "Bus Trips",
y = "Frequency"
) +
theme(
axis.text.x = element_text(angle = 45, hjust=1)
)
# Weekend Morning
# Extract column
distlogWEM <- origin_grid_wider$logWEM
# Calculate mean
distlogWEM_mean <- mean(distlogWEM)
plot_distlogWEM <- ggplot(
data = data.frame(distlogWEM),
aes(x = distlogWEM)
) +
geom_histogram(
bins = 20,
color = "#FFFCF9",
fill = "#34414E",
alpha = .3
) +
# Add line for mean
geom_vline(
xintercept = distlogWEM_mean,
color = "#595DE5",
linetype = "dashed",
linewidth = 1
) +
# Add line annotations
annotate(
"text",
x = 10,
y = 300,
label = paste("Mean =", round(distlogWEM_mean, 3)),
color = "#595DE5",
size = 3
) +
labs(
title = "Weekdend/Holiday Morning Peak",
x = "Bus Trips",
y = "Frequency"
) +
theme(
axis.text.x = element_text(angle = 45, hjust=1)
)
# Weekend Evening
# Extract column
distlogWEE <- origin_grid_wider$logWEE
# Calculate mean
distlogWEE_mean <- mean(distlogWEE)
plot_distlogWEE <- ggplot(
data = data.frame(distlogWEE),
aes(x = distlogWEE)
) +
geom_histogram(
bins = 20,
color = "#FFFCF9",
fill = "#34414E",
alpha = .3
) +
# Add line for mean
geom_vline(
xintercept = distlogWEE_mean,
color = "#595DE5",
linetype = "dashed",
linewidth = 1
) +
# Add line annotations
annotate(
"text",
x = 10,
y = 300,
label = paste("Mean =", round(distlogWEE_mean, 3)),
color = "#595DE5",
size = 3
) +
labs(
title = "Weekend/Holiday Evening Peak",
x = "Bus Trips",
y = "Frequency"
) +
theme(
axis.text.x = element_text(angle = 45, hjust=1)
)
(plot_distlogWDM | plot_distlogWDA)/
(plot_distlogWEM | plot_distlogWEE)
origin_gridwgeom <- inner_join(hexagon_w_busstops,
origin_grid_wider,
by = "grid_id")In order to retain the geospatial properties, the left data frame must the sf data.frame (i.e. hexagon_w_busstop).
4 Geovisualisation & Analysis
4.1 Data Classification
Different classification schemes highlight areas with the highest and/or lowest values, while others create classes that result in a more uniform distribution of colors. When data is sharply skewed or has extreme outliers, it’s important to consider whether the goal is to emphasize those areas or to achieve a more even distribution of colors and sizes.
The main methods of data classification are:
Quantile: each class contains an equal number of features. It assigns the same number of data values to each class. There are no empty classes or classes with too few or too many values
Jenks/Natural breaks: seeks clumps of values that are clustered together in order to form categories that may reflect meaningful groupings of areas
Equal: divides the range of attribute values into equal-sized sub-ranges
Since our variable is less skewed after log transformation, we can explore various classification methods for visualization. This approach may reveal interesting insights that were not immediately apparent before.
4.2 Visualising Bus Ridership
Show the code
tmap_mode ("plot")
plotlogWDM_q <-
tm_shape(origin_gridwgeom) +
tm_fill("logWDM",
style = "quantile",
palette = "Blues",
title = "Total passenger trips",
alpha=0.6) +
tm_borders(alpha = 0.5)+
tm_layout(main.title="Quantile Classification",
main.title.size = 0.8,
main.title.position = "center",
legend.height = 0.35,
legend.width = 0.25,
frame = FALSE)
plotlogWDM_e <-
tm_shape(origin_gridwgeom) +
tm_fill("logWDM",
style = "equal",
palette = "Blues",
title = "Total passenger trips",
alpha=0.6,
id="LOC_DESC") +
tm_borders(alpha = 0.5)+
tm_layout(main.title="Equal Interval Classification",
main.title.size = 0.8,
main.title.position = "center",
legend.height = 0.35,
legend.width = 0.25,
frame = FALSE)
plotlogWDM_j <-
tm_shape(origin_gridwgeom) +
tm_fill("logWDM",
style = "jenks",
palette = "Blues",
title = "Total passenger trips",
alpha=0.6,
id="LOC_DESC") +
tm_borders(alpha = 0.5)+
tm_layout(main.title="Jenks Classification",
main.title.size = 0.8,
main.title.position = "center",
legend.height = 0.35,
legend.width = 0.25,
frame = FALSE)
tmap_arrange(plotlogWDM_q, plotlogWDM_e, plotlogWDM_j, asp=1, ncol=3)
The leftmost map above is partitioned using the quantile intervals. We can observe that the bus trips are unevenly distributed across Singapore. There are lighter shares of blue (indicating lower levels of ridership) originating from the edges of the country, particularly in the West, while higher levels of ridership in the North region are indicated by the darker shades of blue.
Bus stops nearer to the residential estates appeared to be popular during the weekday morning peak period. This is likely due to a large number of people commuting from home to their workplaces/schools on weekday mornings.
Higher passenger traffic were noted at the bus stops nearer to MRT stations such as Harbourfront Station, Farrer Road Station, Yio Chu Kang Station, and Admirality Station. A possible contributing factor could be the people who are transiting from taking the MRTs to buses to get to their destinations.
Lastly, Woodlands Checkpoint also demonstrated higher ridership. Woodlands Checkpoint is one of Singapore’s two land border checkpoints, facilitating the movement of ground traffic between Singapore and Malaysia. This could potentially be due to the people commuting across the causeway between Malaysia and Singapore.
The map using equal intervals provided similar insights. We noted that the bus stop located near to MRT stations had higher levels of ridership. In particular, more trips originated from Tiong Bahru Station, Buona Vista Station, Tanah Merah Station, Admiralty Station, Harbourfront, and Woodleigh Station. Bus interchanges also appeared to be popular origins, i.e. Bukit Panjang Interchange and Joo Koon Interchange. In general, more homogeneity is noted using the equal interval – the contrast between hexagon to hexagon is less obvious.
Using Jenk’s partitioning method, the results were largely similar to the other two types of interval classes. Higher bus ridership were spotted at bus stops within close proximity to MRT stations (Kranji Station, Buona Vista Station, Buangkok Station, Ranggung Station, Farrer Road Station, Stevens Station, Bedok Reservoir Station) and residential estates (Sunshine Place near Tengah, BLK 109 in Bedok, BLK 477A in Sengkang, Bef. BLK 629A in Woodlands, to name a few).
Show the code
tmap_mode ("plot")
plotlogWDA_q <-
tm_shape(origin_gridwgeom) +
tm_fill("logWDA",
style = "quantile",
palette = "Reds",
title = "Total passenger trips",
alpha=0.6) +
tm_borders(alpha = 0.5)+
tm_layout(main.title="Quantile Classification",
main.title.size = 0.8,
main.title.position = "center",
legend.height = 0.35,
legend.width = 0.25,
frame = FALSE)
plotlogWDA_e <-
tm_shape(origin_gridwgeom) +
tm_fill("logWDA",
style = "equal",
palette = "Reds",
title = "Total passenger trips",
alpha=0.6,
id="LOC_DESC") +
tm_borders(alpha = 0.5)+
tm_layout(main.title="Equal Interval Classification",
main.title.size = 0.8,
main.title.position = "center",
legend.height = 0.35,
legend.width = 0.25,
frame = FALSE)
plotlogWDA_j <-
tm_shape(origin_gridwgeom) +
tm_fill("logWDA",
style = "jenks",
palette = "Reds",
title = "Total passenger trips",
alpha=0.6,
id="LOC_DESC") +
tm_borders(alpha = 0.5)+
tm_layout(main.title="Jenks Classification",
main.title.size = 0.8,
main.title.position = "center",
legend.height = 0.35,
legend.width = 0.25,
frame = FALSE)
tmap_arrange(plotlogWDA_q, plotlogWDA_e, plotlogWDA_j, asp=1, ncol=3)
A look at the weekday afternoon ridership using the quantile classification yielded the following insights.
- Ridership from Woodlands Checkpoint remained high.
- Bus stops in close proximity to MRT stations and popular bus stops in residential estates continued to have high passenger volumes.
- More trips originating from institutional areas: Opposite Ngee Ann Poly, Temasek Poly, NIE BLK 2, School of the Arts
- More trips originating from industrial buildings/business parks: North Link Bldg, Aft Senoko Way, Mapletree Business City, Woodlands Auto Hub, Opp Airline Hse, etc.
- More trips originating from hospitals: Yishun Community Hospital, Changi General Hospital
- Seletar Camp also looked to have high passenger levels
- The far West seemed to experience low ridership other than the bus stop opposite Tuas Link Station.
- Southern part of Singapore, consisting of more commercial areas, appeared to be more clustered as illustrated by the density of the darker red hexagons, compared to the weekday morning peak period.
Using the Equal Interval classification, there were notably higher concentration of passengers who boarded the bus at Serangoon Station, Harbourfront/VivoCity, Tiong Bahru Station, Admiralty Station, and Punggol Station during weekday afternoons according to the equal interval classification method. Similar to the map for weekday morning peak period, the equal interval seemed to produce more homogeneous classifications.
Jenk’s classification delivered similar insights to the quantile classification, where the higher concentration of ridership can be observed in the Southern downtown areas. It also highlighted Opp Airline Hse in the far East as a bus stop with high ridership, something not as visible using the equal interval method. Alternative methods of commute might be more popular in the West and North-West regions illustrated by the lighter shades of red hexagons.
Show the code
tmap_mode ("plot")
plotlogWEM_q <-
tm_shape(origin_gridwgeom) +
tm_fill("logWEM",
style = "quantile",
palette = "Blues",
title = "Total passenger trips",
alpha=0.6) +
tm_borders(alpha = 0.5)+
tm_layout(main.title="Quantile Classification",
main.title.size = 0.8,
main.title.position = "center",
legend.height = 0.35,
legend.width = 0.25,
frame = FALSE)
plotlogWEM_e <-
tm_shape(origin_gridwgeom) +
tm_fill("logWEM",
style = "equal",
palette = "Blues",
title = "Total passenger trips",
alpha=0.6,
id="LOC_DESC") +
tm_borders(alpha = 0.5)+
tm_layout(main.title="Equal Interval Classification",
main.title.size = 0.8,
main.title.position = "center",
legend.height = 0.35,
legend.width = 0.25,
frame = FALSE)
plotlogWEM_j <-
tm_shape(origin_gridwgeom) +
tm_fill("logWEM",
style = "jenks",
palette = "Blues",
title = "Total passenger trips",
alpha=0.6,
id="LOC_DESC") +
tm_borders(alpha = 0.5)+
tm_layout(main.title="Jenks Classification",
main.title.size = 0.8,
main.title.position = "center",
legend.height = 0.35,
legend.width = 0.25,
frame = FALSE)
tmap_arrange(plotlogWEM_q, plotlogWEM_e, plotlogWEM_j, asp=1, ncol=3)
Generally, the distribution of bus ridership looks varied across the island.
The map partitioned using the quantile intervals showed that during the weekend/holiday peak period, the ridership for far West region remained relatively low. Interestingly, the bus trips recorded from Seletar area appeared to have dipped compared to the weekday peak periods. This suggests that buses in these industrial areas could be oriented towards work-related travel, thus less common on weekends.
Bus stops nearer to housing estates, shopping malls, and Woodlands Checkpoint demonstrated higher levels of weekend morning ridership. The bus stops with the highest boarding passengers are Sunshine Place, Opp BLK 271, BLK 252, Aft. Hasanah Mosque, Buona Vista Station, Harbourfront/Vivocity, Admiralty Station, BLK 555, Bedok Reservoir Station, BLK 22, BLK 109.
The map using interval classification looked more monochromatic, there is less contrast with a more uniform appearance. The following bus stops have been identified to have the highest ridership during weekend/holiday morning peak period: Harbourfront/Vivocity, Tiong Bahru Station, Orchard Station/Lucky Plaza, Admirality Station, Aft. Punggol Road.
Jenk’s method identified more hot spots across the island than the other two methods.
Show the code
tmap_mode ("plot")
plotlogWEE_q <-
tm_shape(origin_gridwgeom) +
tm_fill("logWEE",
style = "quantile",
palette = "Reds",
title = "Total passenger trips",
alpha=0.6) +
tm_borders(alpha = 0.5)+
tm_layout(main.title="Quantile Classification",
main.title.size = 0.8,
main.title.position = "center",
legend.height = 0.35,
legend.width = 0.25,
frame = FALSE)
plotlogWEE_e <-
tm_shape(origin_gridwgeom) +
tm_fill("logWEE",
style = "equal",
palette = "Reds",
title = "Total passenger trips",
alpha=0.6,
id="LOC_DESC") +
tm_borders(alpha = 0.5)+
tm_layout(main.title="Equal Interval Classification",
main.title.size = 0.8,
main.title.position = "center",
legend.height = 0.35,
legend.width = 0.25,
frame = FALSE)
plotlogWEE_j <-
tm_shape(origin_gridwgeom) +
tm_fill("logWEE",
style = "jenks",
palette = "Reds",
title = "Total passenger trips",
alpha=0.6,
id="LOC_DESC") +
tm_borders(alpha = 0.5)+
tm_layout(main.title="Jenks Classification",
main.title.size = 0.8,
main.title.position = "center",
legend.height = 0.35,
legend.width = 0.25,
frame = FALSE)
tmap_arrange(plotlogWEE_q, plotlogWEE_e, plotlogWEE_j, asp=1, ncol=3)
On weekend/holiday evenings, there seemed to be increased ridership at the bus stops near Changi Airport compared to the other peak periods as witnessed in the first plot using the quantile classification method. Sunshine Plaza remains one of the most frequented bus stops, exhibiting high ridership across all four peak periods. While the exact reason for this is difficult to pinpoint, it’s possible that the buses stopping here connect to a wide variety of regions, which could explain the high ridership. Woodlands Checkpoint also demonstrated high levels of ridership across the different peak periods.
Visually, it looks like there are more bus stops with high ridership across Singapore during the weekend/holiday evening peak period. For example, there seem to be an increase in passenger volume at the Tanah Merah Ferry compared to the other peak periods.
Using the equal interval classification method, the bus stops with higher traffic seem to be quite consistent across the different peak periods. This includes Woodlands Checkpoint, Kranji Station, Admiralty Station, Serangoon Station, Aft. Punggol Road, Bukit Panjang MRT, Yio Chu Kang Interchange.
The Jenk’s method seems to suggest that hot spots are mostly congregated in the core of Singapore, with cold spots along the periphery of Singapore.
While the choropleths were useful for quickly visualizing results, the different plots reveal that varying data classification methods can lead to different findings. To draw objective conclusions, we should supplement our initial findings with geospatial analysis.
5 Spatial Association Analysis
According to Tobler’s First Law of Geography, “everything is related to everything else, but near things are more related than distant things.”
This sub-section will cover the computation of local spatial autocorrelation statistics and spatial complete randomness test for local spatial autocorrelation. The goal of these analyses is to understand whether there are clusters or outliers of bus stop with high or low ridership across Singapore.
5.1 Spatial Weights Matrix
To compute the local spatial autocorrelation statistics, we first need to construct a spatial weights of Singapore. Spatial relationships are multi-directional and multi-lateral. We can use spatial weights to define who the neighbours of the spatial units.
There are two common methods of spatial weights: contiguity-based and distance-based.
Contiguity-based: Neighbours share a common boundary, which can be further distinguished between a Rook and a Queen criterion of contiguity. Rook contiguity defines neighbours by the existence of a common edge between two spatial units. In Queen contiguity defines neighbours as spatial units sharing a common edge or a common vertex.
Distance-based: Assign higher weights to pairs of locations that are closer to each other and lower weights to pairs that are further. This can be further distinguished by fixed weighting, adaptive weighting and inverse-distance weighting schemes. Fixed weighting scheme considers two regions are neighbours if they are within a specified distance from one another. For adaptive weighting scheme, each region will have the same number of neighbours. The number of neighbour is specified beforehand, where k = number of neighbours. Inverse distance method considers that the closer two features are in space, the more likely they are to interact/influence each other.
For this study, we will be using distance-based weight matrix as there are areas where bus stops are sparse (such as Lim Chu Kang and Mandai) and isolated (for example, Tanah Merah Ferry, Changi Naval Base, Resort World Sentosa, Marina Bay Cruise Centre). Consequently, contiguity-based matrix may yield many regions with no neighbours, making it not suitable for our analysis.
Step 1: Determine Cut-Off Distance Limit
First step is to determine the upper distance limit to ensure that each hexagon has at least 1 neighbour.
The following functions can be used:
- st_knn() of sfdep is used to identify neighbors based on k (e.g. k = 8 indicates the nearest eight neighbours). The output is a neighbours list of class nb. If polygon geometry is provided, the centroids of the polygon will be used for calculation.
- st_nb_dists() of sfdep is used to calculate the nearest neighbour distance. The output is a list of distances for each observation’s neighbors list.
- unlist() of Base R is then used to return the output as a vector so that the summary statistics of the nearest neighbour distances can be derived.
geo <- sf::st_geometry(origin_gridwgeom)
nb <- st_knn(geo,
longlat = TRUE)
dists <- unlist(st_nb_dists(geo, nb))Step 2: Derive Summary Stats
We can derive summary statistics of the nearest neighbour distances vector (i.e. dists) by using the code chunk below.
summary(dists) Min. 1st Qu. Median Mean 3rd Qu. Max.
500.0 500.0 500.0 503.4 500.0 2291.3
The maximum nearest neighbour distance is 2291.3m, thus we will use threshold value of 2292m to ensure each spatial unit has a minimum of 1 neighbour.
Step 3: Compute fixed distance weight
Now we will go ahead to compute the fixed distance weights by using following functions:
st_dists_band() of sfdep is used to identify neighbors based on a distance band (i.e. 1000m). The output is a list of neighbours (i.e. nb). st_weights() is then used to calculate polygon spatial weights of the nb list. Note that the default style argument is set to “W” for row standardized weights, and the default allow_zero is set to TRUE, assigns zero as lagged value to zone without neighbors.
wm_fd <- origin_gridwgeom %>%
mutate(nb = st_dist_band(geo,
upper = 2992),
wt = st_weights(nb),
.before = 1)Step 4: Observations
Show the code
summary(wm_fd$nb)Neighbour list object:
Number of regions: 1503
Number of nonzero links: 106746
Percentage nonzero weights: 4.725346
Average number of links: 71.02196
Link number distribution:
1 2 4 9 10 11 12 13 14 15 16 17 19 20 21 22 23 24 25 26
1 1 1 4 2 2 3 1 2 3 3 1 2 2 1 1 6 5 3 3
27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
8 9 4 3 7 2 7 6 4 9 9 5 13 10 7 9 9 12 13 16
47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66
15 11 13 19 18 15 19 19 24 16 12 22 20 29 24 25 9 33 27 15
67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86
20 24 21 24 26 23 24 20 30 19 20 24 23 22 32 29 20 23 26 33
87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106
29 30 31 31 25 24 22 23 18 27 18 26 20 13 17 12 16 6 14 7
107 108 109 110
5 6 3 3
1 least connected region:
1503 with 1 link
3 most connected regions:
1013 1024 1050 with 110 links
From the result above, we can confirm that all hexagons have at least one neighbour and there are 3 well-connected hexagons with 110 neighbours. We can also identify an average of 71 neighbours per hexagon using the distance-based weight matrix.
A characteristic of fixed distance weights matrix is that more densely settled areas (town, residential neighbourhoods) tend to have more neigbours while less densely settle areas (military camps, industrial estates) tend to have less neighbours. To overcome the issue of fixed distance weight matrix where there is uneven distribution of neighbours, we can directly control the number of neighbours using k-nearest neighbours by setting the value of k in the code chunk below.
As a rule-of-thumb, we will set k = 8 i.e., all hexagons will have 8 neighbours.
wm_ad <- origin_gridwgeom %>%
mutate(nb = st_knn(geo,
k=8),
wt = st_weights(nb),
.before = 1)
head(wm_ad, n=3)Simple feature collection with 3 features and 13 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 3720.122 ymin: 27925.48 xmax: 4720.122 ymax: 31100.9
Projected CRS: SVY21 / Singapore TM
nb
1 2, 4, 5, 8, 9, 12, 22, 23
2 1, 4, 5, 8, 9, 12, 22, 23
3 5, 6, 9, 10, 13, 14, 16, 17
wt grid_id busstops
1 0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125 34 1
2 0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125 65 1
3 0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125 99 1
LOC_DESC WDA WDM WEM WEE logWDM logWDA logWEM logWEE
1 AFT TUAS STH BLVD 417 62 5 65 4.127134 6.033086 1.609438 4.174387
2 BEF TUAS STH AVE 14 110 50 24 26 3.912023 4.700480 3.178054 3.258097
3 YONG NAM 249 44 27 54 3.784190 5.517453 3.295837 3.988984
area_hexagon_grid
1 POLYGON ((3970.122 27925.48...
2 POLYGON ((4220.122 28358.49...
3 POLYGON ((4470.122 30523.55...
The results show that the weights of the neighbours have been assigned to 1/8 (0.125) of the total weight, representing each of the 8 neighbours.
Inverse distance weights takes into account the decay functions of distance.
We can derive spatial weight matrix based on inverse distance method using the following functions:
- st_knn() of sfdep is used to identify neighbors based on k (e.g. k = 8 indicates the nearest eight neighbours). The output is a neighbours list of class nb. If polygon geometry is provided, the centroids of the polygon will be used for calculation.
- st_inverse_distance() is then used to calculate inverse distance weights of neighbours on the nb list.
For our analysis, we will set the number of neighbours to 8.
wm_idw <- origin_gridwgeom %>%
mutate(nb = st_knn(geo,
k=8),
wts = st_inverse_distance(nb, geo,
scale = 1,
alpha = 1),
.before = 1)
head(wm_idw, n=3)Simple feature collection with 3 features and 13 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 3720.122 ymin: 27925.48 xmax: 4720.122 ymax: 31100.9
Projected CRS: SVY21 / Singapore TM
nb
1 2, 4, 5, 8, 9, 12, 22, 23
2 1, 4, 5, 8, 9, 12, 22, 23
3 5, 6, 9, 10, 13, 14, 16, 17
wts
1 0.0020000000, 0.0011547005, 0.0004364358, 0.0007559289, 0.0005000000, 0.0005547002, 0.0005547002, 0.0004588315
2 0.0020000000, 0.0020000000, 0.0005547002, 0.0011547005, 0.0006666667, 0.0007559289, 0.0006666667, 0.0005773503
3 0.0020000000, 0.0020000000, 0.0010000000, 0.0020000000, 0.0011547005, 0.0011547005, 0.0007559289, 0.0010000000
grid_id busstops LOC_DESC WDA WDM WEM WEE logWDM logWDA
1 34 1 AFT TUAS STH BLVD 417 62 5 65 4.127134 6.033086
2 65 1 BEF TUAS STH AVE 14 110 50 24 26 3.912023 4.700480
3 99 1 YONG NAM 249 44 27 54 3.784190 5.517453
logWEM logWEE area_hexagon_grid
1 1.609438 4.174387 POLYGON ((3970.122 27925.48...
2 3.178054 3.258097 POLYGON ((4220.122 28358.49...
3 3.295837 3.988984 POLYGON ((4470.122 30523.55...
The inverse distance approach is best suited for continuous data or for modeling scenarios where spatial proximity increases the likelihood of interaction or influence between two features. Nevertheless, this method treats every feature as a potential neighbor to every other feature, which can lead to a significant computational burden, especially in the case of large datasets where the volume of calculations required becomes substantial.
In summary:
- The number of neighbours using fixed distance method vary widely from 1 to 102. Consequently, the uneven distribution could affect the spatial autocorrelation analysis.
- Inverse distance method is computationally intensive as each feature is potentially a neighbour of every other feature.
- Since each hexagon is equally sized, the adaptive distance-based spatial weight matrix would be best suited for our analysis since each centroid can represent each region well.
5.2 Global Spatial Autocorrelation Statistics
We will perform Moran’s I statistical testing by using global_moran_perm() of spdep. The Global Moran’s I Permutation Test is a statistical method used in spatial analysis to assess the significance of spatial autocorrelation in a dataset. Spatial autocorrelation refers to the degree to which a variable is correlated with itself across space, indicating patterns such as clustering or dispersion.
The Moran I statistic ranges from -1 to 1. If the Moran I is:
- positive (I>0): Clustered, observations tend to be similar
- negative (I<0): Disperse, observations tend to be dissimilar
- approximately zero: observations arranged randomly over space
The code chunk below performs Moran’s I statistical testing, using the null and alternative hypotheses as follows:
H0: The observed spatial patterns of proportion of bus ridership in Singapore are not clustered (i.e. either random or dispersed). H1: The observed spatial patterns of proportion of bus ridership in Singapore are clustered.
A total of 100 simulations will be performed using the original and logged values with a seed number 1234. set.seed() function allows us to create reproducible results.
set.seed(1234)Original Data
Show the code
gmp_WDM <- global_moran_perm(wm_ad$WDM,
wm_ad$nb,
wm_ad$wt,
nsim = 99)
gmp_WDM
Monte-Carlo simulation of Moran I
data: x
weights: listw
number of simulations + 1: 100
statistic = 0.19938, observed rank = 100, p-value < 2.2e-16
alternative hypothesis: two.sided
Log-Transformed Variable
Show the code
gmp_logWDM <- global_moran_perm(wm_ad$logWDM,
wm_ad$nb,
wm_ad$wt,
nsim = 99)
gmp_logWDM
Monte-Carlo simulation of Moran I
data: x
weights: listw
number of simulations + 1: 100
statistic = 0.5159, observed rank = 100, p-value < 2.2e-16
alternative hypothesis: two.sided
Original Data
Show the code
gmp_WDA <- global_moran_perm(wm_ad$WDA,
wm_ad$nb,
wm_ad$wt,
nsim = 99)
gmp_WDA
Monte-Carlo simulation of Moran I
data: x
weights: listw
number of simulations + 1: 100
statistic = 0.059528, observed rank = 100, p-value < 2.2e-16
alternative hypothesis: two.sided
Log-Transformed Data
Show the code
gmp_logWDA <- global_moran_perm(wm_ad$logWDA,
wm_ad$nb,
wm_ad$wt,
nsim = 99)
gmp_logWDA
Monte-Carlo simulation of Moran I
data: x
weights: listw
number of simulations + 1: 100
statistic = 0.35711, observed rank = 100, p-value < 2.2e-16
alternative hypothesis: two.sided
Original Data
Show the code
gmp_WEM <- global_moran_perm(wm_ad$WEM,
wm_ad$nb,
wm_ad$wt,
nsim = 99)
gmp_WEM
Monte-Carlo simulation of Moran I
data: x
weights: listw
number of simulations + 1: 100
statistic = 0.15158, observed rank = 100, p-value < 2.2e-16
alternative hypothesis: two.sided
Log-Transformed Data
Show the code
gmp_logWEM <- global_moran_perm(wm_ad$logWEM,
wm_ad$nb,
wm_ad$wt,
nsim = 99)
gmp_logWEM
Monte-Carlo simulation of Moran I
data: x
weights: listw
number of simulations + 1: 100
statistic = 0.49385, observed rank = 100, p-value < 2.2e-16
alternative hypothesis: two.sided
Original Data
Show the code
gmp_WEE <- global_moran_perm(wm_ad$WEE,
wm_ad$nb,
wm_ad$wt,
nsim = 99)
gmp_WEE
Monte-Carlo simulation of Moran I
data: x
weights: listw
number of simulations + 1: 100
statistic = 0.097321, observed rank = 100, p-value < 2.2e-16
alternative hypothesis: two.sided
Log-Transformed Data
Show the code
gmp_logWEE <- global_moran_perm(wm_ad$logWEE,
wm_ad$nb,
wm_ad$wt,
nsim = 99)
gmp_logWEE
Monte-Carlo simulation of Moran I
data: x
weights: listw
number of simulations + 1: 100
statistic = 0.40407, observed rank = 100, p-value < 2.2e-16
alternative hypothesis: two.sided
Across the 4 peak periods, the permutation test generated low p-values of <0.05. This indicates that we can reject the null hypothesis at the 95% level of confidence, and conclude that for each of the 4 peak periods, the overall bus ridership across Singapore is spatially clustered (since positive Moran’s I value is obtained). The higher Moran’s I values for logged-transformed variables suggests a higher level of clustering compared to the original ridership values.
5.3 Local Spatial Autocorrelation Statistics
Global spatial autocorrelation provides a broad overview of spatial clustering within a dataset, offering a single value that indicates whether similar values are generally clustered or dispersed across the entire study area. In contrast, local spatial autocorrelation delves into specific locations, identifying where clusters of similar values (hot spots or cold spots) or spatial outliers exist. While global metrics give an overall trend, local metrics provide detailed, location-specific insights, highlighting exact areas of significant spatial clustering or anomaly.
Thus, after we have established through statistical testing that spatial clustering of bus ridership occurs in Singapore, we now seek to detect clusters or outliers and discover if there are any hot or cold spots of high ridership using Local Spatial Autocorrelation Statistics.
5.3.1 Local Moran’s I
In this section, we will perform Moran’s I statistics testing by using local_moran() of sfdep. The output of local_moran() is a sf data.frame, containing the columns below:
- ii: local moran statistic
- eii: expectation of local Moran’s I statistic
- var_ii: variance of local Moran’s I statistic
- z_ii: standard deviation of local Moran’s I statistic
- p_ii: p-value of local Moran’s I statistic using pnorm()
- p_ii_sim: For
localmoran_perm(),rank()andpunif()of observed statistic rank for [0, 1] p-values usingalternative= - p_folded_sim: the simulation folded [0, 0.5] range ranked p-value, based on crand.py of pysal
- skewness: For
localmoran_perm, the output of e1071::skewness() for the permutation samples underlying the standard deviates - kurtosis: For
localmoran_perm, the output of e1071::kurtosis() for the permutation samples underlying the standard deviates.
unnest() of tidyr package helps expand a list-column containing data frames into rows and columns.
Original Data
Show the code
lisa_WDM <- wm_ad %>%
mutate(local_moran = local_moran(
origin_gridwgeom$WDM, nb, wt, nsim = 99),
.before = 1) %>%
unnest(local_moran)
head(lisa_WDM, n=5)Simple feature collection with 5 features and 25 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 3720.122 ymin: 27925.48 xmax: 4970.122 ymax: 31100.9
Projected CRS: SVY21 / Singapore TM
# A tibble: 5 × 26
ii eii var_ii z_ii p_ii p_ii_sim p_folded_sim skewness kurtosis
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0.386 -0.0300 0.0517 1.83 0.0673 0.02 0.01 -0.986 1.02
2 0.386 0.00824 0.0439 1.80 0.0715 0.02 0.01 -1.59 3.83
3 0.384 0.0181 0.0392 1.85 0.0646 0.02 0.01 -1.25 2.84
4 0.382 -0.00282 0.0440 1.83 0.0669 0.02 0.01 -1.33 2.74
5 0.367 0.0196 0.0363 1.82 0.0688 0.02 0.01 -0.911 0.458
# ℹ 17 more variables: mean <fct>, median <fct>, pysal <fct>, nb <nb>,
# wt <list>, grid_id <int>, busstops <int>, LOC_DESC <chr>, WDA <dbl>,
# WDM <dbl>, WEM <dbl>, WEE <dbl>, logWDM <dbl>, logWDA <dbl>, logWEM <dbl>,
# logWEE <dbl>, area_hexagon_grid <POLYGON [m]>
Log-Transformed Data
Show the code
lisa_logWDM <- wm_ad %>%
mutate(local_moran = local_moran(
origin_gridwgeom$logWDM, nb, wt, nsim = 99),
.before = 1) %>%
unnest(local_moran)
head(lisa_logWDM, n=5)Simple feature collection with 5 features and 25 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 3720.122 ymin: 27925.48 xmax: 4970.122 ymax: 31100.9
Projected CRS: SVY21 / Singapore TM
# A tibble: 5 × 26
ii eii var_ii z_ii p_ii p_ii_sim p_folded_sim skewness kurtosis
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2.50 0.0725 0.395 3.85 0.000116 0.02 0.01 0.164 -0.366
2 2.60 -0.00763 0.371 4.28 0.0000186 0.02 0.01 0.152 0.00836
3 2.43 0.00367 0.499 3.43 0.000596 0.02 0.01 0.451 0.187
4 1.98 0.00665 0.283 3.70 0.000212 0.02 0.01 0.142 -0.585
5 0.846 -0.0305 0.0363 4.60 0.00000421 0.02 0.01 0.266 -0.274
# ℹ 17 more variables: mean <fct>, median <fct>, pysal <fct>, nb <nb>,
# wt <list>, grid_id <int>, busstops <int>, LOC_DESC <chr>, WDA <dbl>,
# WDM <dbl>, WEM <dbl>, WEE <dbl>, logWDM <dbl>, logWDA <dbl>, logWEM <dbl>,
# logWEE <dbl>, area_hexagon_grid <POLYGON [m]>
Original Data
Show the code
lisa_WDA <- wm_ad %>%
mutate(local_moran = local_moran(
origin_gridwgeom$WDA, nb, wt, nsim = 99),
.before = 1) %>%
unnest(local_moran)
head(lisa_WDA, n=5)Simple feature collection with 5 features and 25 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 3720.122 ymin: 27925.48 xmax: 4970.122 ymax: 31100.9
Projected CRS: SVY21 / Singapore TM
# A tibble: 5 × 26
ii eii var_ii z_ii p_ii p_ii_sim p_folded_sim skewness kurtosis
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0.205 -0.00517 0.0243 1.35 0.177 0.02 0.01 -2.11 5.11
2 0.209 -0.0201 0.0338 1.25 0.213 0.02 0.01 -1.96 3.82
3 0.204 0.0222 0.0211 1.25 0.211 0.02 0.01 -2.54 9.59
4 0.192 0.0250 0.0136 1.43 0.152 0.02 0.01 -2.33 6.51
5 0.177 -0.00590 0.0164 1.43 0.153 0.02 0.01 -1.78 4.07
# ℹ 17 more variables: mean <fct>, median <fct>, pysal <fct>, nb <nb>,
# wt <list>, grid_id <int>, busstops <int>, LOC_DESC <chr>, WDA <dbl>,
# WDM <dbl>, WEM <dbl>, WEE <dbl>, logWDM <dbl>, logWDA <dbl>, logWEM <dbl>,
# logWEE <dbl>, area_hexagon_grid <POLYGON [m]>
Log-Transformed Data
Show the code
lisa_logWDA <- wm_ad %>%
mutate(local_moran = local_moran(
origin_gridwgeom$logWDA, nb, wt, nsim = 99),
.before = 1) %>%
unnest(local_moran)
head(lisa_logWDA, n=5)Simple feature collection with 5 features and 25 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 3720.122 ymin: 27925.48 xmax: 4970.122 ymax: 31100.9
Projected CRS: SVY21 / Singapore TM
# A tibble: 5 × 26
ii eii var_ii z_ii p_ii p_ii_sim p_folded_sim skewness kurtosis
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1.90 0.0644 0.303 3.34 0.000848 0.04 0.02 0.849 0.770
2 2.67 -0.0583 0.537 3.73 0.000194 0.02 0.01 -0.153 -0.661
3 2.10 0.00427 0.375 3.42 0.000619 0.06 0.03 0.946 1.62
4 0.924 -0.0318 0.0420 4.66 0.00000311 0.02 0.01 0.581 0.695
5 0.523 0.00710 0.0151 4.21 0.0000258 0.02 0.01 0.343 -0.534
# ℹ 17 more variables: mean <fct>, median <fct>, pysal <fct>, nb <nb>,
# wt <list>, grid_id <int>, busstops <int>, LOC_DESC <chr>, WDA <dbl>,
# WDM <dbl>, WEM <dbl>, WEE <dbl>, logWDM <dbl>, logWDA <dbl>, logWEM <dbl>,
# logWEE <dbl>, area_hexagon_grid <POLYGON [m]>
Original Data
Show the code
lisa_WEM <- wm_ad %>%
mutate(local_moran = local_moran(
origin_gridwgeom$WEM, nb, wt, nsim = 99),
.before = 1) %>%
unnest(local_moran)
head(lisa_WEM, n=5)Simple feature collection with 5 features and 25 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 3720.122 ymin: 27925.48 xmax: 4970.122 ymax: 31100.9
Projected CRS: SVY21 / Singapore TM
# A tibble: 5 × 26
ii eii var_ii z_ii p_ii p_ii_sim p_folded_sim skewness kurtosis
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0.315 0.00993 0.0417 1.49 0.135 0.02 0.01 -1.98 5.32
2 0.314 0.0264 0.0334 1.57 0.116 0.02 0.01 -1.77 4.00
3 0.307 0.0271 0.0316 1.57 0.116 0.02 0.01 -1.72 4.44
4 0.303 0.0129 0.0262 1.80 0.0725 0.02 0.01 -1.30 1.87
5 0.290 -0.0248 0.0488 1.43 0.154 0.02 0.01 -1.93 3.92
# ℹ 17 more variables: mean <fct>, median <fct>, pysal <fct>, nb <nb>,
# wt <list>, grid_id <int>, busstops <int>, LOC_DESC <chr>, WDA <dbl>,
# WDM <dbl>, WEM <dbl>, WEE <dbl>, logWDM <dbl>, logWDA <dbl>, logWEM <dbl>,
# logWEE <dbl>, area_hexagon_grid <POLYGON [m]>
Log-Transformed Data
Show the code
lisa_logWEM <- wm_ad %>%
mutate(local_moran = local_moran(
origin_gridwgeom$logWEM, nb, wt, nsim = 99),
.before = 1) %>%
unnest(local_moran)
head(lisa_logWEM, n=5)Simple feature collection with 5 features and 25 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 3720.122 ymin: 27925.48 xmax: 4970.122 ymax: 31100.9
Projected CRS: SVY21 / Singapore TM
# A tibble: 5 × 26
ii eii var_ii z_ii p_ii p_ii_sim p_folded_sim skewness kurtosis
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 4.21 0.111 1.35 3.53 0.000413 0.02 0.01 0.224 -0.322
2 3.27 0.0112 0.693 3.91 0.0000919 0.02 0.01 0.400 0.614
3 2.64 0.0295 0.453 3.88 0.000103 0.02 0.01 0.342 -0.542
4 1.92 -0.0367 0.153 5.00 0.000000568 0.02 0.01 0.168 -0.344
5 0.903 -0.0294 0.0373 4.83 0.00000135 0.02 0.01 0.181 -0.155
# ℹ 17 more variables: mean <fct>, median <fct>, pysal <fct>, nb <nb>,
# wt <list>, grid_id <int>, busstops <int>, LOC_DESC <chr>, WDA <dbl>,
# WDM <dbl>, WEM <dbl>, WEE <dbl>, logWDM <dbl>, logWDA <dbl>, logWEM <dbl>,
# logWEE <dbl>, area_hexagon_grid <POLYGON [m]>
Original Data
Show the code
lisa_WEE <- wm_ad %>%
mutate(local_moran = local_moran(
origin_gridwgeom$WEE, nb, wt, nsim = 99),
.before = 1) %>%
unnest(local_moran)
head(lisa_WEE, n=5)Simple feature collection with 5 features and 25 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 3720.122 ymin: 27925.48 xmax: 4970.122 ymax: 31100.9
Projected CRS: SVY21 / Singapore TM
# A tibble: 5 × 26
ii eii var_ii z_ii p_ii p_ii_sim p_folded_sim skewness kurtosis
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0.223 -0.00162 0.0255 1.41 0.159 0.02 0.01 -2.01 6.38
2 0.225 -0.0145 0.0384 1.22 0.222 0.02 0.01 -2.25 5.82
3 0.216 0.00855 0.0216 1.41 0.157 0.02 0.01 -1.70 3.01
4 0.213 -0.00814 0.0345 1.19 0.233 0.02 0.01 -2.88 11.0
5 0.200 -0.00909 0.0319 1.17 0.241 0.02 0.01 -2.02 3.82
# ℹ 17 more variables: mean <fct>, median <fct>, pysal <fct>, nb <nb>,
# wt <list>, grid_id <int>, busstops <int>, LOC_DESC <chr>, WDA <dbl>,
# WDM <dbl>, WEM <dbl>, WEE <dbl>, logWDM <dbl>, logWDA <dbl>, logWEM <dbl>,
# logWEE <dbl>, area_hexagon_grid <POLYGON [m]>
Log-Transformed Data
Show the code
lisa_logWEE <- wm_ad %>%
mutate(local_moran = local_moran(
origin_gridwgeom$logWEE, nb, wt, nsim = 99),
.before = 1) %>%
unnest(local_moran)
head(lisa_logWEE, n=5)Simple feature collection with 5 features and 25 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 3720.122 ymin: 27925.48 xmax: 4970.122 ymax: 31100.9
Projected CRS: SVY21 / Singapore TM
# A tibble: 5 × 26
ii eii var_ii z_ii p_ii p_ii_sim p_folded_sim skewness kurtosis
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2.17 -0.00124 0.385 3.50 0.000466 0.02 0.01 0.174 0.342
2 2.67 0.0907 0.638 3.22 0.00127 0.02 0.01 -0.0207 0.00460
3 2.02 -0.156 0.419 3.36 0.000767 0.04 0.02 0.577 0.605
4 1.20 0.0287 0.103 3.64 0.000277 0.02 0.01 0.336 -0.0657
5 0.675 -0.00449 0.0309 3.87 0.000110 0.02 0.01 0.788 0.969
# ℹ 17 more variables: mean <fct>, median <fct>, pysal <fct>, nb <nb>,
# wt <list>, grid_id <int>, busstops <int>, LOC_DESC <chr>, WDA <dbl>,
# WDM <dbl>, WEM <dbl>, WEE <dbl>, logWDM <dbl>, logWDA <dbl>, logWEM <dbl>,
# logWEE <dbl>, area_hexagon_grid <POLYGON [m]>
5.3.2 Visualising Local Moran’s I & p-value
To better understand which areas are outliers/clusters, we will use choropleth mapping functions of tmap package to visualise the local Moran’s I values and the associated p-values by using the code chunks below.
Show the code
tmap_mode("plot")
plot_lisaWDM <-
tm_shape(lisa_WDM) +
tm_fill("ii",
style="pretty",
palette="RdBu") +
tm_view(set.zoom.limits = c(6,8)) +
tm_layout(main.title = "Local Moran's I",
main.title.size = 0.7,
main.title.position = "center",
legend.title.size = 1,
legend.text.size = 0.5,
legend.position = c("left","top"),
frame = FALSE)
plotp_lisaWDM <-
tm_shape(lisa_WDM) +
tm_fill("p_ii_sim",
breaks = c(0, 0.001, 0.01, 0.05, 1),
palette="-Blues",
labels = c("0.001", "0.01", "0.05", "Not sig")) +
tm_view(set.zoom.limits = c(6,8)) +
tm_layout(main.title = "p-value of Local Moran's I",
main.title.size = 0.7,
main.title.position = "center",
legend.title.size = 1,
legend.text.size = 0.5,
legend.position = c("left","top"),
frame = FALSE)
plot_lisalogWDM <-
tm_shape(lisa_logWDM) +
tm_fill("ii",
style="pretty",
palette="RdBu") +
tm_view(set.zoom.limits = c(6,8)) +
tm_layout(main.title = "Local Moran's I\n (Using Log-Transformed Variables)",
main.title.size = 0.7,
main.title.position = "center",
legend.title.size = 1,
legend.text.size = 0.5,
legend.position = c("left","top"),
frame = FALSE
)
plotp_lisalogWDM <-
tm_shape(lisa_logWDM) +
tm_fill("p_ii_sim",
breaks = c(0, 0.001, 0.01, 0.05, 1),
palette="-Blues",
labels = c("0.001", "0.01", "0.05", "Not sig")) +
tm_view(set.zoom.limits = c(6,8)) +
tm_layout(main.title = "p-value of Local Moran's I\n (Using Log-Transformed Variables)",
main.title.size = 0.7,
main.title.position = "center",
legend.title.size = 1,
legend.text.size = 0.5,
legend.position = c("left","top"),
frame = FALSE)
tmap_arrange(plot_lisaWDM, plotp_lisaWDM, plot_lisalogWDM, plotp_lisalogWDM, asp=2, ncol=2, nrow = 2) 
Show the code
tmap_mode("plot")
plot_lisaWDA <-
tm_shape(lisa_WDA) +
tm_fill("ii",
style="pretty",
palette="RdBu") +
tm_view(set.zoom.limits = c(6,8)) +
tm_layout(main.title = "Local Moran's I",
main.title.size = 0.7,
main.title.position = "center",
legend.title.size = 1,
legend.text.size = 0.5,
legend.position = c("left","top"),
frame = FALSE
)
plotp_lisaWDA <-
tm_shape(lisa_WDA) +
tm_fill("p_ii_sim",
breaks = c(0, 0.001, 0.01, 0.05, 1),
palette="-Blues",
labels = c(-Inf, "0.01", "0.05", "Not sig")) +
tm_view(set.zoom.limits = c(6,8)) +
tm_layout(main.title = "p-value of Local Moran's I",
main.title.size = 0.7,
main.title.position = "center",
legend.title.size = 1,
legend.text.size = 0.5,
legend.position = c("left","top"),
frame = FALSE
)
plot_lisalogWDA <-
tm_shape(lisa_logWDA) +
tm_fill("ii",
style="pretty",
palette="RdBu") +
tm_view(set.zoom.limits = c(6,8)) +
tm_layout(main.title = "Local Moran's I\n (Using Log-Transformed Variables)",
main.title.size = 0.7,
main.title.position = "center",
legend.title.size = 1,
legend.text.size = 0.5,
legend.position = c("left","top"),
frame = FALSE
)
plotp_lisalogWDA <-
tm_shape(lisa_logWDA) +
tm_fill("p_ii_sim",
breaks = c(0, 0.001, 0.01, 0.05, 1),
palette="-Blues",
labels = c("0.001", "0.01", "0.05", "Not sig")) +
tm_view(set.zoom.limits = c(6,8)) +
tm_layout(main.title = "p-value of Local Moran's I\n (Using Log-Transformed Variables)",
main.title.size = 0.7,
main.title.position = "center",
legend.title.size = 1,
legend.text.size = 0.5,
legend.position = c("left","top"),
frame = FALSE
)
tmap_arrange(plot_lisaWDA, plotp_lisaWDA, plot_lisalogWDA, plotp_lisalogWDA, asp=2, ncol=2)
Show the code
tmap_mode("plot")
plot_lisaWEM <-
tm_shape(lisa_WEM) +
tm_fill("ii",
style="pretty",
palette="RdBu") +
tm_view(set.zoom.limits = c(6,8)) +
tm_layout(main.title = "Local Moran's I",
main.title.size = 0.7,
main.title.position = "center",
legend.title.size = 1,
legend.text.size = 0.5,
legend.position = c("left","top"),
frame = FALSE
)
plotp_lisaWEM <-
tm_shape(lisa_WEM) +
tm_fill("p_ii_sim",
breaks = c(0, 0.001, 0.01, 0.05, 1),
palette="-Blues",
labels = c(-Inf, "0.01", "0.05", "Not sig")) +
tm_view(set.zoom.limits = c(6,8)) +
tm_layout(main.title = "p-value of Local Moran's I",
main.title.size = 0.7,
main.title.position = "center",
legend.title.size = 1,
legend.text.size = 0.5,
legend.position = c("left","top"),
frame = FALSE
)
plot_lisalogWEM <-
tm_shape(lisa_logWEM) +
tm_fill("ii",
style="pretty",
palette="RdBu") +
tm_view(set.zoom.limits = c(6,8)) +
tm_layout(main.title = "Local Moran's I\n (Using Log-Transformed Variables)",
main.title.size = 0.7,
main.title.position = "center",
legend.title.size = 1,
legend.text.size = 0.5,
legend.position = c("left","top"),
frame = FALSE
)
plotp_lisalogWEM <-
tm_shape(lisa_logWEM) +
tm_fill("p_ii_sim",
breaks = c(0, 0.001, 0.01, 0.05, 1),
palette="-Blues",
labels = c("0.001", "0.01", "0.05", "Not sig")) +
tm_view(set.zoom.limits = c(6,8)) +
tm_layout(main.title = "p-value of Local Moran's I\n (Using Log-Transformed Variables)",
main.title.size = 0.7,
main.title.position = "center",
legend.title.size = 1,
legend.text.size = 0.5,
legend.position = c("left","top"),
frame = FALSE
)
tmap_arrange(plot_lisaWEM, plotp_lisaWEM, plot_lisalogWEM, plotp_lisalogWEM, asp=2, ncol=2, nrow = 2)
Show the code
tmap_mode("plot")
plot_lisaWEE <-
tm_shape(lisa_WEE) +
tm_fill("ii",
style="pretty",
palette="RdBu") +
#tm_borders(alpha = 0.5) +
tm_view(set.zoom.limits = c(6,8)) +
tm_layout(main.title = "Local Moran's I",
main.title.size = 0.7,
main.title.position = "center",
legend.title.size = 1,
legend.text.size = 0.5,
legend.position = c("left","top"),
frame = FALSE)
plotp_lisaWEE <-
tm_shape(lisa_WEE) +
tm_fill("p_ii_sim",
breaks = c(0, 0.001, 0.01, 0.05, 1),
palette="-Blues",
labels = c(-Inf, "0.01", "0.05", "Not sig")) +
#tm_borders(alpha = 0.5) +
tm_view(set.zoom.limits = c(6,8)) +
tm_layout(main.title = "p-value of Local Moran's I",
main.title.size = 0.7,
main.title.position = "center",
legend.title.size = 1,
legend.text.size = 0.5,
legend.position = c("left","top"),
frame = FALSE)
plot_lisalogWEE <-
tm_shape(lisa_logWEE) +
tm_fill("ii",
style="pretty",
palette="RdBu") +
# tm_borders(alpha = 0.5) +
tm_view(set.zoom.limits = c(6,8)) +
tm_layout(main.title = "Local Moran's I\n (Using Log-Transformed Variables)",
main.title.size = 0.7,
main.title.position = "center",
legend.title.size = 1,
legend.text.size = 0.5,
legend.position = c("left","top"),
frame = FALSE
)
plotp_lisalogWEE <-
tm_shape(lisa_logWEE) +
tm_fill("p_ii_sim",
breaks = c(0, 0.001, 0.01, 0.05, 1),
palette="-Blues",
labels = c("0.001", "0.01", "0.05", "Not sig")) +
#tm_borders(alpha = 0.5) +
tm_view(set.zoom.limits = c(6,8)) +
tm_layout(main.title = "p-value of Local Moran's I\n (Using Log-Transformed Variables)",
main.title.size = 0.7,
main.title.position = "center",
legend.title.size = 1,
legend.text.size = 0.5,
legend.position = c("left","top"),
frame = FALSE
)
tmap_arrange(plot_lisaWEE, plotp_lisaWEE, plot_lisalogWEE, plotp_lisalogWEE, asp=2, ncol=2, nrow=2)
The plot on the top-left shows regions of positive (blue) and negative (orange) Moran’s I statistics, indicating positive and negative clustering relationships. In particular, bus stops near transit hubs like Woodlands Checkpoint, Clementi Station, and Punggol Station have high Local Moran’s I statistics.
On the top-right, we see that the p-values are significant at 5% significance level for the regions in the 3 darker shades of blue. We see that areas in the far West, North-West, and far East have statistically significant p-values for the cluster/outlier spatial relationship that they see.
We obtain different insights using the log-transformed variables to calculate Local Moran’s I. On weekdays, North-West region, particularly around Lim Chu Kang / Sungei Buloh. The plot on the right suggests that with p-value <0.05, the high spatial clustering observed in the area is also statistically significant.
LISA cluster maps combines both Local Moran’s I values and p-values to derive more insightful observations.
5.3.3 Visualising LISA Map
The LISA Cluster Map allows us to identify hotspots can its statistical significance. It is colour-coded by four types of spatial autocorrelation:
- High-High: Positive autocorrelation, i.e. clusters (indicates hexagon and its neighbours all have high ridership)
- Low-Low: Positive autocorrelation, i.e. clusters (indicates hexagon and its neighbours all have low number ridership)
- Low-High: Negative autocorrelation, i.e. outlier (indicates hexagon with low ridership among neighbours with high ridership)
- High-Low: Negative autocorrelation, i.e. outlier (indicates hexagon with high ridership among neighbours with low ridership)
The following functions are used:
filter() from dplyr used to retain records with statistically significant results (p-value < 0.05)
tmap_mode(“view”) allows for interactive viewing of plots
Note: The median is used for untransformed variables due to their right-skewed distribution, while the mean is used for log-transformed variables.
Weekday Morning
Show the code
lisa_sig_WDM <- lisa_WDM %>%
filter(p_ii_sim < 0.05)
tmap_mode("view")
tm_basemap("OpenStreetMap") +
tm_shape(lisa_WDM) +
tm_fill(alpha=0.6)+
tm_polygons() +
tm_shape(lisa_sig_WDM) +
tm_fill("median",
alpha=0.6,
id="LOC_DESC") +
tm_borders(alpha = 0.4)Show the code
lisa_sig_logWDM <-lisa_logWDM %>%
filter(p_ii_sim < 0.05)
tmap_mode("view")
tm_basemap("OpenStreetMap") +
tm_shape(lisa_logWDM) +
tm_fill(alpha=0.6)+
tm_polygons() +
tm_shape(lisa_sig_logWDM) +
tm_fill("mean",
alpha=0.6,
id="LOC_DESC") +
tm_borders(alpha = 0.4)The plot demonstrates that the statistically significant Moran’s I values are depicted in salmon for high-high autocorrelation and in teal for low-low autocorrelation, indicating clusters. Areas displaying low-low autocorrelation are primarily observed around the edges of the island, notably in the West, North-West, and East regions. In the context of our study, these areas experience low bus ridership and are surrounded by bus stops that also have low ridership. This pattern could suggest that on weekday mornings, these areas either have less human traffic or that residents prefer other forms of transportation. More and larger clusters are also seen using the log-transformed variables, where similar neighbouring hexagons demonstrate similar autocorrelation.
High-high autocorrelation areas, where both an area and its neighboring areas have high bus ridership, are scattered across the island. These are predominantly found near transit hubs such as Boon Lay Station, Aft. Bukit Panjang Station, Woodlands Checkpoint, and Buona Vista. Some residential estates and the neighourhood schools, including Punggol and Clementi, also exhibit high-high autocorrelation. This trend could be attributed to people traveling from their homes or switching transportation modes, opting for buses to reach their destinations. Retail centres appear to have high-high autocorrelation during the weekday morning peak-periods as well, evidenced in areas like Teck Whye Shopping Centre.
Several outliers are identified in purple and yellow. The Low-High regions (in purple) are generally found in proximity to high-high regions (in salmon). Conversely, High-Low regions (in yellow) appear to be located near low-low regions (in teal).
Weekday Afternoons
Show the code
lisa_sig_WDA <- lisa_WDA %>%
filter(p_ii_sim < 0.05)
tmap_mode("view")
tm_basemap("OpenStreetMap") +
tm_shape(lisa_WDA) +
tm_fill(alpha=0.6)+
tm_polygons() +
tm_shape(lisa_sig_WDA) +
tm_fill("median",
alpha=0.6,
id="LOC_DESC") +
tm_borders(alpha = 0.4)Show the code
lisa_sig_logWDA <- lisa_logWDA %>%
filter(p_ii_sim < 0.05)
tmap_mode("view")
tm_basemap("OpenStreetMap") +
tm_shape(lisa_logWDA) +
tm_fill(alpha=0.6)+
tm_polygons() +
tm_shape(lisa_sig_logWDA) +
tm_fill("mean",
alpha=0.6,
id="LOC_DESC") +
tm_borders(alpha = 0.4)For the weekday afternoon peak period, areas exhibiting high-high autocorrelation include transit hubs such as Opp. LakeSide Station, Bef. Khatib Station, Compassvale Station, Serangoon Station, Downtown Station, and others. This trend mirrors that of weekday mornings, where transit hubs experience high bus ridership, likely due to passengers switching between different modes of transportation. Bus stop near area of attraction (like the Singapore Zoo) also showed high-high autocorrelation on weekday afternoons, compared to the weekday morning peak period.
Using the log-transformed variable, the stretch along Orchard Road from Orchard Station to War Memorial Park, and CBD area in the South have also been highlighted as a high-high cluster.
During these afternoon peak periods, there appears to be less statistically significant autocorrelation, particularly in areas like the Loyang region, including Changi Prison and Laguna National Country Club. Notably, using the untransformed values, the Woodlands Checkpoint area does not demonstrate the statistically significant high-high relationship expected from the previous Section 4.2.2 Geovisualisation or in comparison to weekday mornings. This observation might indicate that there is more human traffic moving from Malaysia to Singapore in the mornings, resulting in higher bus ridership originating from the checkpoints.
Weekend/Holidays Morning
Show the code
lisa_sig_WEM <- lisa_WEM %>%
filter(p_ii_sim < 0.05)
tmap_mode("view")
tm_basemap("OpenStreetMap") +
tm_shape(lisa_WEM) +
tm_fill(alpha=0.6)+
tm_polygons() +
tm_shape(lisa_sig_WEM) +
tm_fill("median",
alpha=0.6,
id="LOC_DESC") +
tm_borders(alpha = 0.4)Show the code
lisa_sig_logWEM <- lisa_logWEM %>%
filter(p_ii_sim < 0.05)
tmap_mode("view")
tm_basemap("OpenStreetMap") +
tm_shape(lisa_logWEM) +
tm_fill(alpha=0.6)+
tm_polygons() +
tm_shape(lisa_sig_logWEM) +
tm_fill("mean",
alpha=0.6,
id="LOC_DESC") +
tm_borders(alpha = 0.4)There are a lot more low-low regions on weekend/holiday mornings, suggesting generally lower levels of ridership compared to the weekday peak periods. These clusters were noted along the peripheral edges. This is in line with the observations made in our barplot in Section 3.2.1. Another difference with the weekday peak periods is the low-low autocorrelation noted at the Science Park area evidenced by the cluster of teal hexagons in the South-West region.
There are also notably less high-high clusters in the West, specifically in the Jurong West / Boon Lay areas, and more of such clusters in the South-East region (i.e. Bras Basah and Marine Parade). There is also seemingly an influx of passengers from the Tanah Merah Ferry Terminal on weekend/holiday mornings, in contrast to the low-low autocorrelation for weekday-mornings.
Compared to the high-high autocorrelation noted during weekday afternoon peak period, Singapore Zoo showed a High-Low autocorrelation on weekend mornings instead, suggesting that it’s ridership remains high but lower ridership recorded in the surrounding regions.
Weekend/Holidays Evening
Show the code
lisa_sig_WEE <- lisa_WEE %>%
filter(p_ii_sim < 0.05)
tmap_mode("view")
tm_basemap("OpenStreetMap") +
tm_shape(lisa_WEE) +
tm_fill(alpha=0.6)+
tm_polygons() +
# tm_borders(alpha = 0.5) +
tm_shape(lisa_sig_WEE) +
tm_fill("median",
alpha=0.6,
id="LOC_DESC") +
tm_borders(alpha = 0.4)Show the code
lisa_sig_logWEE <- lisa_logWEE %>%
filter(p_ii_sim < 0.05)
tmap_mode("view")
tm_basemap("OpenStreetMap") +
tm_shape(lisa_logWEE) +
tm_fill(alpha=0.6)+
tm_polygons() +
tm_shape(lisa_sig_logWEE) +
tm_fill("mean",
alpha=0.6,
id="LOC_DESC") +
tm_borders(alpha = 0.4)Using the untransformed passenger volume, the bus stops in the West, North-West, and Changi Village regions continue to experience low ridership on weekday and holiday evenings, as illustrated by the clusters of teal hexagons. Bus stops located near transit hubs and around the shopping district in town exhibit high-high autocorrelation (in salmon). Notably, the area around Woodlands MRT station shows significant high-high autocorrelation, which could be attributed to its connectivity with Woodlands Bus Interchange and its status as an interchange serving both the North-South Line and the Thomson-East Coast Line. Interestingly, no high-low outliers (in yellow) are observed in the Central and South regions of Singapore.
More clusters are shown with the log-transformed passenger volume. In particular, Hougang Central, Paya Lebar / Joo Chiat region, Toa Payoh Central, Tiong Bahru and Clementi were not previously highlighted as high-high clusters before.
6 Conclusions
Our analysis of the originating bus stops of passenger trips has yielded several insights into statistically significant clusters and outliers based on the four peak hour periods. Overall, our analysis revealed clearly uneven distributions of bus ridership across Singapore. There is consistent high-high clusters around bus stops near transit hubs and residential neighborhoods across the four identified peak periods. In contrast, peripheral areas, predominantly composed of industrial estates, displayed low-low tendencies. Our analysis has also showed that variable transformation for a data with skewed distribution can deliver more in-dept results.
Future work can consider the following:
Our analysis has solely focused on originating bus stops. A more insightful analysis might emerge from considering the destinations, thus understanding the flow of passengers.
Given the clusters observed around bus stops near transit hubs, further analysis could be conducted in conjunction with data from other modes of public transportation.
It is also crucial to consider the population densities and socio-economic status of each region, as these factors might affect bus ridership compared to other modes of transport.
Exploring the connectivity of available buses at each bus stop might also yield interesting insights. For example, feeder buses within a neighborhood might be more widely used compared to buses traveling long distances to secluded areas with less demand.